Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs
نویسندگان
چکیده
One of the central challenges in reinforcement learning is to balance the exploration/exploitation tradeoff while scaling up to large problems. Although model-based reinforcement learning has been less prominent than value-based methods in addressing these challenges, recent progress has generated renewed interest in pursuing modelbased approaches: Theoretical work on the exploration/exploitation tradeoff has yielded provably sound model-based algorithms such as E 3 and Rmax, while work on factored MDP representations has yielded model-based algorithms that can scale up to large problems. Recently the benefits of both achievements have been combined in the Factored E3 algorithm of Kearns and Koller. In this paper, we address a significant shortcoming of Factored E3: namely that it requires an oracle planner that cannot be feasibly implemented. We propose an alternative approach that uses a practical approximate planner, approximate linear programming, that maintains desirable properties. Further, we develop an exploration strategy that is targeted toward improving the performance of the linear programming algorithm, rather than an oracle planner. This leads to a simple exploration strategy that visits states relevant to tightening the LP solution, and achieves sample efficiency logarithmic in the size of the problem description. Our experimental results show that the targeted approach performs better than using approximate planning for implementing either Factored E3 or Factored Rmax.
منابع مشابه
Discovering Hierarchy in Reinforcement Learning with HEXQ
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically is described. By searching for aliased Markov sub-space regions based on the state variables the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
متن کاملEfficient Reinforcement Learning in Factored MDPs
We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) o...
متن کاملChi-square Tests Driven Method for Learning the Structure of Factored MDPs
sdyna is a general framework designed to address large stochastic reinforcement learning (rl) problems. Unlike previous model-based methods in Factored mdps (fmdps), it incrementally learns the structure of a rl problem using supervised learning techniques. spiti is an instantiation of sdyna that uses decision trees as factored representations. First, we show that, in structured rl problems, sp...
متن کاملEfficient Structure Learning in Factored-State MDPs
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns th...
متن کاملTeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
Reinforcement learning is one of the main adaptive mechanisms that is both well documented in animal behaviour and giving rise to computational studies in animats and robots. In this paper, we present TeXDYNA, an algorithm designed to solve large reinforcement learning problems with unknown structure by integrating hierarchical abstraction techniques of Hierarchical Reinforcement Learning and f...
متن کامل